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ABSTRACT 

Prediction models that capture and use the structure 
of state-space dynamics can be very effective. In prac¬ 
tice, however, one rarely has access to full information 
about that structure, and accurate reconstruction of the 
dynamics from scalar time-series data—e.g., via delay- 
coordinate embedding—can be a real challenge. In this 
paper, we show that forecast models that employ incom¬ 
plete embeddings of the dynamics can produce surpris¬ 
ingly accurate predictions of the state of a dynamical sys¬ 
tem. In particular, we demonstrate the effectiveness of a 
simple near-neighbor forecast technique that works with 
a two-dimensional embedding. Even though correctness 
of the topology is not guaranteed for incomplete recon¬ 
structions like this, the dynamical structure that they 
capture allows for accurate predictions—in many cases, 
even more accurate than predictions generated using a 
full embedding. This could be very useful in the context 
of real-time forecasting, where the human effort required 
to produce a correct delay-coordinate embedding is pro¬ 
hibitive. 


LEAD PARAGRAPH 

Prediction models constructed from state-space dy¬ 
namics have a long and rich history, dating back to 
roulette and beyond. A major stumbling block in the 
application of these models in real-world situations is the 
need to reconstruct the dynamics from scalar time-series 
data—e.g., via delay-coordinate embedding. This pro¬ 
cedure, which is the topic of a large and active body of 
literature, involves estimation of two free parameters: the 
dimension m of the reconstruction space and the delay, r, 
between the observations that make up the coordinates in 
that space. Estimating good values for these parameters 
is not trivial; it requires the proper mathematics, atten¬ 
tion to the data requirements, computational effort, and 
expert interpretation of the results of the calculations. 
This is a major challenge if one is interested in real-time 
forecasting, especially when the systems involved operate 
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on fast time scales. In this paper, we show that the full 
effort of delay-coordinate embedding is not always nec¬ 
essary when one is building forecast models, and can in¬ 
deed be overkill. Using synthetic time-series data gener¬ 
ated from the Lorenz-96 atmospheric model and real data 
from a computer performance experiment, we demon¬ 
strate that a two-dimensional embedding of scalar time- 
series data from a dynamical system gives simple forecast 
methods enough traction to generate accurate predic¬ 
tions of the future course of those dynamics—sometimes 
even more accurate than predictions created using the 
full embedding. Since incomplete embeddings do not 
preserve the topology of the full dynamics, this is inter¬ 
esting from a mathematical standpoint. It is also poten¬ 
tially useful in practice. This reduced-order forecasting 
strategy involves only one free parameter (r), good val¬ 
ues for which, we believe, can be estimated ‘on the fly’ 
using information-theoretic and/or machine-learning al¬ 
gorithms. As such, it sidesteps much of the complexity of 
the embedding process—perhaps most importantly, the 
need for expert human interpretation—and thus could 
enable automated, real-time dynamics-based forecasting 
in practical applications. 

I. INTRODUCTION 

Complicated nonlinear dynamics are ubiquitous in nat¬ 
ural and engineered systems. Methods that capture and 
use the state-space structure of a dynamical system are 
a proven strategy for forecasting the behavior of systems 
like this, but the task is not straightforward. The govern¬ 
ing equations and the state variables are rarely known; 
rather, one has a single (or perhaps a few) series of scalar 
measurements that can be observed from the system. It 
can be a challenge to model the full dynamics from data 
like this, especially in the case of forecast models, which 
are only really useful if they can be constructed and ap¬ 
plied on faster time scales than those of the target system. 
While the traditional state-space reconstruction machin¬ 
ery is a good way to accomplish the task of modeling 
the dynamics, it is problematic in real-time forecasting 
because it generally requires input from and interpreta¬ 
tion by a human expert in order to work properly. The 
strategy suggested in this paper sidesteps that roadblock 
by using a reduced-order variant of delay-coordinate em¬ 
bedding to build forecast models for nonlinear dynamical 
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systems. 

Modern approaches to modeling a time series for the 
purposes of forecasting arguably began with Yule’s work 
on predicting the annual number of sunspots 1 ^ through 
what is now known as autoregression. Before this, time- 
series forecasting was done mostly through simple global 
extrapolation's. Global linear methods, of course, are 
rarely adequate when one is working with nonlinear dy¬ 
namical systems; rather, one needs to model the de¬ 
tails of the state-space dynamics in order to make ac¬ 
curate predictions. The usual first step in this process 
is to reconstruct that dynamical structure from the ob¬ 
served data. The state-space reconstruction techniques 
proposed by Packard et aM in 1980 were a critical break¬ 
through in this regard. In the most common variant of 
this now-classic approach, one constructs a set of vec¬ 
tors Xj £ R m where each coordinate is simply a time- 
delayed element of the scalar time-series data Xj, i.e., 
Xj = [xj Xj— T ... Xj_r m _ i) T ] for r > 0. In 1981, Takens 
showed that this delay-coordinate embedding method pro¬ 
vides a topologically correct representation of a nonlinear 
dynamical system if a specific set of theoretical assump¬ 
tions are satisfied^ in 1991, Sauer et al. extended this 
discussion and relaxed some of the theoretical restric- 
tiond^l. This remains a highly active field of research; 
see, for example, AbarbaneP or Kantz & Schreiber 7 for 
surveys. 

A large number of creative strategies have been de¬ 
veloped for using the state-space structure of a dynami¬ 
cal system to generate predictions (e. g.\ 2IMH1) Perhaps 
the most simple of these is the “Lorenz Method of Ana¬ 
logues” (LMA), which is essentially nearest-neighbor pre¬ 
diction 13 . Lorenz’s original formulation of this idea used 
the full system state space; this method was extended to 
embedded dynamics by Pikovskjff^, but is also related to 
the prediction work of Sugihara & Majff^, among oth¬ 
ers. Even this simple strategy—which, as described in 
more detail in Section |II B[ builds predictions by look¬ 
ing for the nearest neighbor of a given point and tak¬ 
ing that neighbor’s observed path as the forecast—works 
quite well for forecasting nonlinear dynamical systems. 
LMA and similar methods have been used successfully 
to forecast measles and chickenpox outbreaks 3 C, marine 
phytoplankton populationJ-P performance dynamics of a 
running computer 14 16 ', the fluctuations in a far-infrared 
lasePESI, and many more. 

The reconstruction step that is necessary before these 
methods can be applied to scalar time-series data, how¬ 
ever, can be problematic. Delay-coordinate embedding 
is a powerful piece of machinery, but estimating values 
for its two free parameters, the time delay r and the di¬ 
mension m, is not trivial. A large number of heuristics 
have been proposed for this task (e.g. lIimMl), but these 
methods, described in more detail in Section |II A[ are 
computationally intensive and they require input from— 
and interpretation by—a human expert. This can be a 
real problem in a prediction context: a millisecond-scale 
forecast is not useful if it takes seconds or minutes to 


produce. And it is even more of a problem in nonstation¬ 
ary systems, since the reconstruction machinery is only 
guaranteed to work for an infinitely long noise-free obser¬ 
vation of a single dynamical system. If effective forecast 
models are to be constructed and applied in a manner 
that outpaces the dynamics of the target system, then, 
one may not be able to use the full, traditional delay- 
coordinate embedding machinery to reconstruct the dy¬ 
namics. 


The goal of the work described in this paper was to 
sidestep that problem by developing prediction strate¬ 
gies that work in incomplete embedding spaces. The 
conjecture that forms the basis for our work is that a 
full formal embedding, although mandatory for detailed 
dynamical analysis, is not necessary for the purposes of 
prediction. As a first step towards validating that con¬ 
jecture, we constructed two-embeddings from a number 
of different time-series data sets, both simulated and ex¬ 
perimental, and then built forecast models in that space. 
Sections III and IV of this paper present and discuss those 
results in some detail. In short, we found that forecasts 
produced using the Lorenz method of analogues on a 
two-dimensional delay-coordinate embedding are roughly 
as accurate as—and often even more accurate than— 
forecasts produced by the same method working in the 
full embedding space. Figure [l] shows a quick example: a 
pair of forecasts of the so-called “Dataset A”, a time se¬ 
ries from a far-infrared laser from the Santa Fe Institute 
prediction competition^, generated with LMA on full and 
2D embeddings. 


The main point of Figure [l] -and the main claim of 
this paper—concerns the similarity of the two panels. 
The forecast generated in the full embedding space is 
not much more accurate than the one generated in the 
two-dimensional reconstruction. The errors between true 
and predicted values were 0.117 and 0.148, respectively, 
as measured using the assessment procedure and error 
metric covered in Section |II B| a better choice of r, as 
described in Section |IV[ brings the latter value down 
to 0.119. That is, even though the low-dimensional re¬ 
construction is not completely faithful to the underly¬ 
ing dynamics, it appears to be good enough to support 
accurate forecast models of nonlinear dynamics. Both 
of these LMA-based forecasts, incidentally, significantly 
outperformed traditional strategies like random-walk (by 
a factor of « 8.5) and autoregressive integrated moving 
average (by a factor of ss 6.5). 

The results in Sections [Til] and ||V] offer a deeper vali¬ 
dation of the claim that the full complexity (and effort) 
of the delay-coordinate ‘unfolding’ process may not be 
strictly necessary to the success of forecast models of 
real-world nonlinear dynamical systems. In effect, the 
reduced-order modeling strategy that we propose is a 
kind of balance of a tradeoff between the power of the 
state-space reconstruction machinery and the effort re¬ 
quired to use it. Fixing m = 2 effectively avoids all of the 
in-depth, post-facto, data-dependent analysis that is re¬ 
quired to properly estimate a value for this parameter— 
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time (j) time (j) 

(a) LMA in the full embedding space (“fnn-LMA”) (b) LMA in a two-dimensional embedding space (“ro-LMA”) 

FIG. 1: Forecasts of SFI data set A using Lorenz’s method of analogues in (a) full and (b) 2D reconstructions of the 
state-space dynamics. Blue os are the true continuation Cj of the time series and red xs ( pj ) are the forecasts; black 
error bars are provided if there is a discrepancy between the two. Embeddi ng p arameter values were estimated using 
standard techniques: the first minimum of the average mutual information 23 for the delay in both images and the 
false near neighbor method of Kennel et alP3, with a threshold of 20%, for the dimension in the left-hand image. 
Even though the 2 D embedding used in (b) is not faithful to the underlying topology, it enables successful 

forecasting of the time series. 


which is arguably the harder part of the process. It 
also avoids the high computational complexity that is in¬ 
volved in near-neighbor searches in high-dimension state 
spaces—an essential step in almost any forecast strategy. 
Of course, there is still a free parameter: the delay r. 
As described in Section m though, we believe that good 
values for this parameter can be estimated ‘on the fly,’ 
with little to no human guidance, which would make this 
method both agile and resilient. 

No forecast model will be ideal for every task. In 
fact, as a corollary of the undecidability of the halting 
problenPH, no single forecasting schema is ideal for all 
noise-free deterministic signals 2 let alone all real-world 
time-series data sets. We do not want to give the im¬ 
pression that the strategy proposed here will be effective 
for every time series, but we do intend to show that it 
is effective for a broad spectrum of signals. Additionally, 
we want to emphasize that it is a short -term forecasting 
scheme. Dimensional reduction is a double-edged sword; 
it enables on-the-fly forecasting by eliminating a difficult 
estimation step, but it effects a guaranteed information 
loss in the model. This well-known effeclP all but guar¬ 
antees problems with accuracy as prediction horizons are 


increased. We explore this limitation in Section III A 


II. BACKGROUND AND METHODS 
A. Delay-Coordinate Embedding 

The process of collecting a time series {xj}^ x from a 
dynamical system (aka a “trace”) is formally the eval¬ 
uation of an observation function h : X —> ffi. at the 
true system state y(tj) at time tj for j = 1 ie., 

Xj = h(y(tj)) for j = 1,..., N% Provided that the un¬ 


derlying dynamics $ and the observation function are 
both smooth and generic, TakensP proves that the delay 
coordinate map: 

F(h,®,T,m)(y(tj)) = ([xj Xj- T ... x j _ (m _ 1)T \ T ) (1) 

from a d-dimensional smooth compact manifold M to 
R 2d+1 is a diffeomorphism on M. To assure topological 
conjugacy, the proof requires that the embedding dimen¬ 
sion to must be at least twice the dimension d of the am¬ 
bient space; a tighter bound of to > 2 d cap , the capacity 
dimension of the original dynamics, was later established 
by Sauer et alP. Operationalizing either of these theoret¬ 
ical constraints can be a real challenge, d is not known 
and accurate d cap calculations are not easy with exper¬ 
imental data. The theoretical constraints on the time 
delay are less stringent: r must be gre ater than zero and 
not a multiple of any orbit’s period 4 -A In practice, how¬ 
ever, the noisy and finite-precision nature of digital data 
and floating-point arithmetic combine to make the choice 
of r much more delicateP. 

As mentioned in the previous section, the forecast 
strategy proposed in this paper sidesteps the challenge 
of estimating the embedding dimension by fixing to = 2. 
Selection of a value for the remaining free parameter, r, 
is still an iss ue, though. There are dozens of methods 
for this— e g IZUlHl ll. I n t his paper, we use the method of 
mutual informatior$2i23i^ j n which r is chosen at the first 
minimum of the time-delayed mutual information, calcu¬ 
lated using the TISEAN packages. Fraser & Swinney ar¬ 
gue that this minimizes the redundancy of the embedding 
coordinates, thereby maximizing the information content 
of the overall delay vector 23 . This choice is discussed and 
empirically verified by Liebert and Schustei^, alt hough 
agreement on this topic is by no means universa l 30 ! 31 ! 
And it is well known that choice of r is application- 
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and system-specific: a r that works well for a Lya¬ 
punov e xponen t calculation may not work well f or o ther 
purpose^ 7 1 20 1 321 . Indeed, as we show in Section |IV[ the 
r value suggested by mutual information calculations is 
rarely the optimal choice for our reduced-order forecast 
strategy. Even so, it is a reasonable starting point, as it 
is the standard technique used in the dynamical systems 
community. 

The embedding dimension to is not a parameter in 
our reduced-order forecasting strategy. Since full embed¬ 
dings are the point of departure for the central premise of 
this paper, however—and the point of comparison for our 
results—we briefly review methods used to estimate val¬ 
ues for that parameter. As in the case of r, a number of 
creative strategies f or do ing so have been developed over 
the past few decaded^-- 3 Most of these are based 

in some way on minimizing the number of false crossings 
that are caused by projection. In this paper, we use the 
false near neighbor (FNN) approach of Kennel et aI 2 ^, 
calculated using TISEAlP^with a « 20% threshold on the 
percentage of neighbors. (Whenever we refer to a “full” 
embedding, as in the discussion of Figure[l] we mean that 
the to value was estimated in this fashion and the r value 
was chosen at the first minimum of the mutual informa¬ 
tion, as described in the previous paragraph.) Again, 
no heuristic method is perfect, but FNN is arguably the 
most widely used method for estimating to, and thus is 
useful for the purposes of comparison. 

Finally, it should be noted that there is an alternative 
view that one should estimate t and to together, not 
separatehiraimi. 


of x n in the reconstructed space—namely —and 

maps that vector forward using the delay map, obtaining 

x j(\,rn)+l \. x j( l,m)+l — r (m— l)r] 

Using the neighbor’s image, one defines 

Pn+1 + l X n+1—T • ■ ■ x n+l — (m— l)r] 

LMA defines the forecast of x n +i as p n +i = x j(i, m )+i- 
If performing multi-step forecasts, one appends the new 
delay vector 

Pn+1 = [^ 7(1 ,m) + l X n -\-i — T . . . (m— l)r] 


to the end of the trajectory and repeats this process as 
needed. 

Many more-complicated variants of this algorithm 
have appeared in the literature ( e.g. Emm), most of 
which involve building some flavor of local-linear mod¬ 
els around each delay vector and then using it to make 
the prediction of the next point. Here, we use the basic 
version, in two ways: first—as a baseline for comparison 
purposes—on a “full” embedding of each time series, with 
to chosen using the false near neighbor method of Kennel 
et a I 25 on that data; second, with to = 2. In the rest of 
this paper, we will refer to these as fnn-LMA and ro-LMA, 
respectively. The experiments reported in Section [ITT] use 
the same r value for both fnn-LMA and ro-LMA, choosing 
it at the first minimum of the time-dela yed mutual in¬ 
formation of the time series^. In Section IV we explore 
the effects of varying r on the accuracy of both methods. 


B. Lorenz Method of Analogues 


C. Assessing Forecast Accuracy 


As mentioned in Section[I| the dynamical systems com¬ 
munity has developed a number of methods that capture 
and use state-space structure to create forecasts. Since 
the goal of this paper is to offer a proof of concept of 
the notion that incomplete embeddings could give these 
kinds of methods enough traction to generate useful pre¬ 
dictions, we chose one of the oldest and simplest members 
of that family: Lorenz’s method of analogues (LMA). In 
future work, we will explore reduced-order versions of 
other forecast strategies. 

To apply LMA to a scalar time-series data set {x ? }" =1 , 
one begins by performing a delay-coordinate embedding 
using one or more of the heuristics presented in Sec¬ 
tion IIA to choose to and r. This produces a trajectory 
of the form: 


{xj = [a 


- • • x j — (m— 



\ n 

J j— 1— (m— 1)t 


Forecasting the next point in the time series, x n +i, 
amounts to reconstructing the next delay vector x n +i in 
the trajectory. Note that, by the form of delay-coordinate 
vectors, all but the first coordinate of x n+ i are known. To 
choose the first coordinate, one finds the nearest neighbor 


To evaluate ro-LMA and compare it to fnn-LMA, we 
calculate a figure of merit in the following way. We split 
each ./V-point time series into two pieces: the first 90%, 
referred to as the “initial training” signal and denoted 
{ x j}y= D and the last 10%, known as the “test” signal 

{c j }f=Xi ) ~ N ■ We use initial training signal to build 
the model, following the procedures described in the pre¬ 
vious section. We use that model to generate a prediction 
Pn+i of the value of x n +i, then compare p n +i to the true 
continuation, c n+ The initial investigations that are 
reported in Section III involve “one-step” models, which 
are rebuilt after each step, out to the end of the test sig¬ 
nal, using {c„+ 1 } U { Xj }" =1 . In Section 


IV 


we extend 

this conversation to longer prediction horizons. 

As a numerical measure of prediction accuracy, we cal¬ 
culate the mean absolute scaled error ( A'lASE'f ® be¬ 
tween the true and predicted signals: 


k-\-7i ~\-1 I I 

MASE = V —r - ~ - 

MASE is a normalized measure: the scaling term in the 
denominator is the average in-sample forecast error for a 
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random-walk prediction—which uses the previous value 
in the observed signal as the forecast—calculated over 
the initial training signal {a;'j }" =] . That is, MASE < 1 
means that the prediction error in question was, on the 
average, smaller than the in-sample error of a random- 
walk forecast on the same data. Analogously, MASE > 
1 means that the corresponding prediction method did 
worse , on average, than the random-walk method. The 
MASE value of 0.117 for Figure [TJ for instance, means 
that the fnn-LMA forecast of the SFI data set A was (J y [7 
times better than a random-walk forecast of the initial 
training portion of same signal. 

While its comparative nature may seem odd, this error 
metric allows for fair comparison across varying methods, 
prediction horizons, and signal scales, making it a stan¬ 
dard error measure in the forecasting literature—and a 
good choice for the study described in the following sec¬ 
tions, which involve a number of very different signals. 

III. PREDICTION IN PROJECTION 

In this section, we demonstrate that the accuracies of 
forecasts produced by ro-LMA -Lorenz’s method of ana¬ 
logues, operating on a two-dimensional embedding of a 
trajectory from a dynamical system—are similar to and 
often better than forecasts produced by fnn-LMA, which 
operates on a full reconstruction of the same dynamics. 
While the brief example in Section [T] is a useful first vali¬ 
dation of that statement, it does not support the kind of 
exploration that is necessary to properly evaluate a new 
forecast method, especially one that violates the basic 
tenets of delay-coordinate embedding. The SFI data set 
A is a single trace from a single system. We want to show 
that ro-LMA is comparable to or better than fnn-LMA for 
a range of systems and parameter values—and to repeat 
each experiment for a number of different trajectories 
from each system. To this end, we studied two dynam¬ 
ical systems, one simulated and one real: the Lorenz-96 
model and sensor data from a laboratory experiment on 
computer performance dynamics. 

A. A Synthetic Example: Lorenz-96 

The Lorenz-96 model 35 is a set of K first-order differ¬ 
ential equations relating the K state variables 

ik = (£fc+i - £fc-2)(£fc-i) - + F (2) 

for k = 1 where F £ K. is a constant forcing 

term that is independent of k. In this model, each 
is some atmospheric quantity (such as temperature or 
vorticity) at a discrete location on a periodic lattice rep¬ 
resenting a latitude circle of the earth®!. This model ex¬ 
hibits a wide range of dynamical behavior—everything 
from fixed points and periodic attractors to low- and 
high-dimensional chaos®!—making it an ideal test case 
for our purposes. 


We performed two sets of forecasting experiments with 
traces from the Lorenz-96 model: one with K = 22 and 
the other with K = 47. Both experiments used con¬ 
stant forcing values of F = 5. These choices yield chaotic 
trajectories with low and high Kaplan-Yorke (Lyapunov) 
dimension 37 : dxy 3 for the K = 22 dynamics and 
cIky ~ 19 for K = 41®. Following standard practic^®, 
we enforced periodic boundary conditions and solved 
equation (J2]) from several randomly chosen initial condi¬ 
tions using a standard fourth-order Runge-Kutta integra¬ 
tor for 60,000 steps with a step size of A normalized time 
units. We then discarded the first 10,000 points of that 
trajectory in order to eliminate transient behavior. Fi¬ 
nally, we created scalar time-series traces by individually 
“observing” each of the K state variables of the trajec¬ 
tory: i.e., hi(£i(tj )) = Xjj for j e {10,000,..., 60,000} 
and for i £ {1 We repeated all of this from 

a number of different initial conditions—seven for the 
K = 47 Lorenz-96 system and 15 for the K = 22 case— 
producing a total of 659 traces for our forecasting study. 
For each of these, we used the procedures outlined in Sec¬ 
tion [TTA] to estimate values for the free parameters of the 
embedding process, obtaining m = 8 and r = 26 for all 
traces in the K = 22 case, and m = 10 and r = 31 for 
the K = 47 traced®. 

For the K = 22 dynamics, both ro-LMA and fnn-LMA 
worked quite well. See Figure[2](a) for a time-domain plot 
of an ro-LMA forecast of a representative trace from this 
system and Figures[2](b) and (c) for graphical representa¬ 
tions of the forecast accuracy on that trace for both meth¬ 
ods. The diagonal structure on the pj vs. Cj plots in the 
Figure indicates that both of these LMA-based methods 
performed very well on this trace. More importantly— 
from the standpoint of evaluation of our primary claim— 
the MASE scores of ro-LMA and fnn-LMA forecasts, com¬ 
puted following the procedures described in Section [II C[ 
were 0.391 ±0.016 and 0.441 ±0.033, respectively, across 
the 330 traces at this parameter value. That is, the LMA 
forecasting strategy worked better on a two-dimensional 
embedding of these dynamics than on a full embedding, 
and by a statistically significant margin. This is some¬ 
what startling, given that the two-embedding is not topo¬ 
logically correct. Clearly, though, it captures enough 
structure to allow LMA to generate good predictions. 
And ro-LMA’s reduced-order nature may actually miti¬ 
gate the impact of noise effects, simply because a sin¬ 
gle noisy point in a scalar time series affects m of the 
points in an m-embedding. Of course, r choice, informa¬ 
tion content, and/or data length could also be at work in 
these results; these concerns are addressed in Sections [TV] 
and El 

The K = 47 case is a slightly different story: ro-LMA 
still outperformed fnn-LMA, but not by a statistically sig¬ 
nificant margin. The MASE scores across all 329 traces 
were 0.985 ± 0.047 and 1.007 ± 0.043 for ro-LMA and 
fnn-LMA, respectively. In view of the higher complex¬ 
ity of the state-space structure of the K = 47 version 
of the Lorenz-96 system, the overall increase in MASE 
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(a) 5,000-point forecast using the reduced-order forecast 
method ro-LMA. Blue circles and red xs are the true and 
predicted values, respectively; vertical bars show where these 
values differ. 




FIG. 2: ro-LMA and fnn-LMA forecasts of a 
representative trace from the Lorenz-96 system with 
K = 22 and F = 5. Top: a time-domain plot of the first 
5,000 points of the ro-LMA forecast. Bottom: the 
predicted (pj ) vs true (cj) values for forecasts of that 
trace generated by (b) ro-LMA and (c) fnn-LMA. On 
such a plot, a perfect prediction would lie on the 
diagonal. The MASE scores of the forecasts in (b) and 
(c) were 0.392 and 0.461, respectively. 


scores over the K = 22 case makes sense. Recall that 
<1ky is far higher for the K = 47 case: this attractor 
fills more of the state space and has many more dimen¬ 
sions that are associated with positive Lyapunov expo¬ 
nents. This has obvious implications for predictability. 
It could very well be the case that more data are neces¬ 
sary to reconstruct these dynamics, but our experiments 
held the length of the training set constant across all of 
the Lorenz-96 experiments. These issues are described at 
more length in Section [V] Regardless, it is encouraging 
that the reduced-order forecasting method still performs 
as well as the version that works with a fully ‘unfolded’ 
embedding. 


B. Experimental Data: Computer Performance Dynamics 

Validation with synthetic data are an important first 
step in evaluating any new forecast strategy, but exper¬ 
imental time-series data are the acid test if one is inter¬ 
ested in real-world applications. Our second set of tests 
of ro-LMA, and comparisons of its accuracy to that of 
fnn-LMA, involved data from a laboratory experiment on 
computer performance dynamics. Like Lorenz-96, this 


system has been shown to exhibit a range of interesting 
deterministic dynamical behavior , fro m periodic orbits to 
low- and high-dimensional chao^2M2j ; making it a good 
test case for this paper. It also has important practical 
implications; these dynamics, which arise from the de¬ 
terministic, nonlinear interactions between the hardware 
and the software, have profound effects on execution time 
and memory use. 

Collecting observations of the performance of a run¬ 
ning computer is not trivial. We used the libpfm4 
library, via PAPI (Performance Application Program¬ 
ming Interface) 5.2“ to stop program execution at 
100,000-instruction intervals—the unit of time in these 
experiments—and read the contents of the microproces¬ 
sor’s onboard hardware performance monitors, which we 
had programmed to observe important attributes of the 
system’s dynamics. Se^l for an in-depth description of 
this experimental setup. The signals that are produced 
by this apparatus are scalar time-series measurements of 
system metrics like processor efficiency {e.g., IPC, which 
measures how many instructions are being executed, on 
the average, in each clock cycle) or memory usage (e.g., 
how often the processor had to access the main memory 
during the measurement interval). 

We have tested ro-LMA on traces of many different pro¬ 
cessor and memory performance metrics gathered during 
the execution of a variety of programs on several different 
computers. Here, for conciseness, we focus on processor 
performance traces from two different programs, one sim¬ 
ple and one complex, running on the same Intel i7-based 
computer. Figure [3ja) shows a small segment of an IPC 
time series gathered from that computer as it executed a 
four-line C program (coljnajor) that repeatedly initial¬ 
izes a 256 x 256 matrix in column-major order. On the 
scale of this figure, these dynamics appear to be largely 
periodic, but they are actually chaotic®!. The bottom 
panel in Figure [3] shows a processor efficiency trace from 
a much more complex program: the 403. gcc compiler 
from the SPEC 2006CPU benchmark suite 43 . 

Since computer performance dynamics result from a 
composition of hardware and software, these two pro¬ 
grams represent two different dynamical systems, even 
though they are running on the same computer. The dy¬ 
namical differences are visually apparent from the traces 
in Figure [3] they are mathematically apparent from non¬ 
linear time-series analysis of embeddings of those data'®!, 
as well as in calculations of the information content of the 
two signals. Among other things, 403. gcc has much less 
predictive structure than col_major and is thus much 
harder to forecast!-^. These attributes make this a useful 
pair of experiments for an exploration of the utility of 
reduced-order forecasting. 

For statistical validation, we collected 15 performance 
traces from the computer as it ran each program, calcu¬ 
lated embedding parameter values as described in Sec¬ 
tion |II A[ and generated forecasts of each trace using 
ro-LMA and fnn-LMA. Figure 0] shows Pj vs. Cj plots for 
these forecasts. The diagonal structure on the top two 
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(a) A short segment of a trace of the instructions executed 
per cycle (IPC) during the execution of col_major 
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(b) A trace of the instructions executed per cycle (IPC) 
during the execution of 403. gcc 

FIG. 3: Time-series data from a computer performance 
experiment: processor load traces, in the form of 
instructions executed per cycle (IPC) of (a) a simple 
program (coljnajor) that repeatedly initializes a 256 x 
256 matrix and (b) a complex program (403.gcc) from 
the SPEC benchmark suite. Each point is the average 
IPC over a 100,000 instruction period. 


plots indicates that both ro-LMA and fnn-LMA performed 
well on the coljnajor traces. The MASE scores across 
all 15 trials in this set of experiments were 0.050 ± 0.002 
and 0.063 ± 0.003, respectively— i.e., ro-LMA’s accuracy 
was somewhat worse than that of fnn-LMA. For 403 .gcc, 
however, ro-LMA appears to be somewhat more accurate: 
1.488 ± 0.016 versus fnn-LMA’s 1.530 ± 0.021. Note that 
the 403. gcc MASE scores were higher for both fore¬ 
cast methods than on coljnajor, simply because the 
403. gcc signal contains less predictive structure 1 ^. This 
actually makes the comparison somewhat problematic, 
as discussed at more length on page [9] 

Table [i] summarizes results from all of the experiments 
presented so far. Overall, the results on the computer- 
performance data are consistent with those that we ob¬ 
tained with the Lorenz-96 example in the previous sec¬ 
tion: prediction accuracies of ro-LMA and fnn-LMA were 
quite similar on all traces, despite the former’s use of 
an incomplete embedding. This amounts to a validation 
of the conjecture on which this paper is based. And in 



(a) fnn-LMA on coljnajor 



(c) fnn-LMA on 403.gcc 



(b) ro-LMA on coljnajor 



0.5 1 1.5 2 2.5 



(d) ro-LMA on 403.gcc 


FIG. 4: Predicted (pj) versus true values (cj) for 
forecasts of coljnajor and 403.gcc generated with 
fnn-LMA and ro-LMA. 


TABLE I: MASE scores for both forecast methods and 
all four sets of experiments. 


Signal 


fnn-LMA 

ro-LMA 

Trials 

Lorenz-96 K 

= 22 

0.441 ± 0.033 

0.391 ±0.016 

330 

Lorenz-96 K 

= 47 

1.007 ±0.043 

0.985 ±0.047 

329 

coljnajor 


0.050 ±0.002 

0.063 ±0.003 

15 

403.gcc 


1.5297 ±0.0214 

1.488 ±0.016 

15 


both numerical and experimental examples, ro-LMA ac¬ 
tually outperformed fnn-LMA on the more-complex traces 
(403.gcc, K = 47). We believe that this is due to the 
noise mitigation that is naturally effected by a lower¬ 
dimensional embedding (cf., page [5]). Finally, we found 
that it was possible to improve the performance of both 
ro-LMA and fnn-LMA on all four of these dynamical sys¬ 
tems by adjusting the free parameter, r. It is to this issue 
that we turn next; following that, we address the issue of 
prediction horizon. 


IV. TIME SCALES, PARAMETERS, AND PREDICTION 
HORIZONS 

The r parameter 

The embedding theorems require only that r be greater 
than zero and not a multiple of any period of the dynam¬ 
ics. In practice, however, r can play a critical role in the 
success of delay-coordinate embedding—and any nonlin¬ 
ear time-series analysis that follows' 20 23 . It should not 
































































































be surprising, then, that r might affect the accuracy of 
an LMA-based method that uses the structure of an em¬ 
bedding to make forecasts. 

Figure [5] explores this effect in more detail. Across all r 
values, the MASE of col _maj or was generally lower than 
the other three experiments—again, simply because this 
time series has more predictive structure. The K = 22 
curve is generally lower than the K = 47 one for the same 
reason, as discussed at the end of the previous section. 
For the Lorenz-96 traces, prediction accuracy decreases 
monotonically with r. It is known that increasing r is 
beneficial for longer prediction horizonJS. Our situation 
involves short prediction horizons, so it makes sense that 
our observation is consistent with the contrapositive of 
that result. 

For the experimental traces, the relationship between 
r and MASE score is less simple. There is only a slight 
upward overall trend (not visible at this scale) and the 
curves are nonmonotonic. This latter effect is likely due 
to periodicities in the dynamics, which are very strong in 
the coljnajor signal (viz., a dominant unstable period- 
three orbit in the dynamics, which traces out the top, 
bottom, and middle bands in Figure [3]). Periodicities 
cause obvious problems for embeddings—and forecast 
methods that employ them— if the delay is a harmonic or 
subharmonic of those periods, simply because the coor¬ 
dinates of the delay vector are not independent samples 
of the dynamics. It is for this reason that Takens men¬ 
tions this condition in his original paper. Here, the effect 
of this is an oscillation in the forecast accuracy vs. r 
curve: low when it is a sub/harmonic of the dominant 
unstable periodic orbit in the coljnajor dynamics, for 
instance, then increasing with r as more independence is 
introduced into the coordinates, then falling again as r 
reaches the next sub/harmonic, and so on. 

This naturally leads to the issue of choosing a good 
value for the delay parameter. Recall that all of the ex¬ 
periments reported in Section m used a r value chosen 
at the first minimum of the mutual information curve 
for the corresponding time series. These values are indi¬ 
cated by the black vertical dashed lines in Figure [5j This 
estimation strategy was simply a starting point, chosen 
because it is arguably the most common heuristic used in 
the nonlinear time-series analysis community. As is clear 
from Figure [5] though, it is not the best way to choose 
r for reduced-order forecast strategies. Only in the case 
of coljnajor is that r value optimal for ro-LMA —that 
is, does it fall at the lowest point on the MASE vs. r 
curve. 

This is the point alluded to at the end of Sec¬ 
tion |III| one can improve the performance of ro-LMA 
simply by choosing a different r—i.e., by adjusting the 
one free parameter of the method. In all cases (aside 
from coljnajor, where the default r was the optimal 
value) adjusting r brought ro-LMA’s error down below 
fnn-LMA’s. The improvement can be quite striking: for 
visual comparison, Figure [6] shows ro-LMA forecasts of a 
K = 47 Lorenz-96 trace using default and best-case val- 
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(a) ro-LMA on Lorenz-96 with K = 22. 
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(b) ro-LMA on Lorenz-96 with K = 47. 
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(c) ro-LMA on coljnajor. 
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(d) ro-LMA on 403.gcc. 


FIG. 5: The effect of r on ro-LMA forecast accuracy. 
The blue dashed curves are the average MASE of the 
ro-LMA forecasts; the red dotted lines show ± the 
standard deviation. The black vertical dashed lines 
mark the r that is the first minimum of the mutual 
information curve for all of the time series. 
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ues of r. Again, this supports the main point of this pa¬ 
per: forecast methods based on incomplete embeddings 
of time-series data can be very effective—and much less 
work than those that require a full embedding. 



(a) A ro-LMA forecast (MASE = 0.985) using the 
“default” value of r for this trace, which is chosen at the 
first minimum of the average mutual information. 



(b) A ro-LMA forecast (MASE = 0.115) using the 
optimal value of r for this trace, which was chosen at the 
first minimum of the plot in Figure [5j)b). 


FIG. 6: Time-domain plots of ro-LMA forecasts of a 

K = 47 Lorenz-96 trace with default and best-case r 
values. 

However, that comparison is not really fair. Recall that 
the “full” embedding that is used by fnn-LMA, as defined 
so far, fixes r at the first minimum of the average mutual 
information for the corresponding trace. It may well be 
the case that that r value is suboptimal for that method 
as well -as it was for ro-LMA. To make the comparison 
fair, we performed an additional set of experiments to 
find the optimal r for fnn-LMA. Table [TT] shows the nu¬ 
merical values of the MASE scores, for forecasts made 
with default and best-case r values, for both methods 
and all traces. In the two simulated examples, best-case 
ro-LMA significantly outperformed best-case fnn-LMA; 
in the two experimental examples, the best-case fnn-LMA 
was better, but not by a huge margin. That is, even if one 
optimizes r individually for these two methods, ro-LMA 
keeps up with, and sometimes outperforms, fnn-LMA. 

In view of our claim that part of ro-LMA’s advantage 
stems from the natural noise mitigation effects of a low¬ 
dimensional embedding, it may appear somewhat odd 


that fnn-LMA works better on the experimental time- 
series data, which certainly contain noise. Comparisons 
of large MASE scores are somewhat problematic, how¬ 
ever. Recall that MASE > 1 means that the fore¬ 
cast is worse than an in-sample random-walk forecast of 
the same trace. That is, both LMA-based methods—no 
matter the r values—generate poor predictions. There 
could be a number of reasons for this poor perfor¬ 
mance. 403. gcc has almost no predictive structure 16 
and fnn-LMA’s extra axes may add to its ability to cap¬ 
ture that structure—in a manner that outweighs the po¬ 
tential noise effects of those extra axes. The dynamics 
of col_major, on the other hand, are fairly low dimen¬ 
sional and dominated by a single unstable periodic orbit; 
it could be that the “full” embedding of these dynamics 
captures its structure so well that fnn-LMA is basically 
perfect and ro-LMA cannot do any better. 


While the plots and MASE scores in this paper sug¬ 
gest that ro-LMA forecasts are quite good, it is important 
to note that both “default” and “best-case” r values were 
chosen after the fact in all of those experiments. This is 
not useful in practice. A large part of the point of ro-LMA 
is its ability to work ‘on the fly,’ when one may not have 
the leisure to run an average mutual information calcu¬ 
lation on a long segment of the trace and find a clear 
minumum—and one certainly cannot run a set of exper¬ 
iments like the ones that produced Figure [5] and choose 
an optimal r. (Producing this Figure required 3,000 runs 
involving a total of 22,010,700 forecasted points, which 
took approximately 44.5 hours on an Intel Core i7.) 


We suspect, though, that it is possible to estimate 
good— -although certainly not optimal—values for r, and 
to do so quickly and automatically. Most of the current 
r-estimation methods use some kind of distribution. We 
are exploring how to adapt them to be used in a “rolling” 
fashion, on an evolving distribution that is built up on 
the fly, as the data arrive. Newer variations on mutual 
information may be more appropriate in this situation, 
such as co-informatioif® and multi-information®. An 
appealing alternative is a time-lagged version of permu¬ 
tation entropj!®. We are also exploring geometric and 
topological heuristics from the nonlinear time-series anal¬ 
ysis literature, such as wavering producfill factor , in¬ 
tegral local deformatior i®, and displacement from diag¬ 
onal^. Many of these methods, however, are aimed at 
producing reconstructions from which one can accurately 
estimate dynamical invariants; as mentioned above, the 
optimal r for forecasting may be quite different. More¬ 
over, any method that works with the geometry or topol¬ 
ogy of the dynamics—rather than a distribution of scalar 
values—may be less able to work with short samples of 
that structure. Nonetheless, these methods may be a 
useful complement to the statistical methods mentioned 
above. 
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TABLE II: The effects of the r parameter. The “default” value is fixed, for both ro-LMA and fnn-LMA, at the first 
minimum of the average mutual information for that trace; the “best case” value is chosen individually, for each 

method and each trace, from plots like the ones in Figure [5j 


Signal ro-LMA ro-LMA fnn-LMA fnn-LMA 

(default r) (best-case t) (default r) (best-case r) 
Lorenz-96 K = 22 0.391 ± 0.016 0.073 ± 0.002 0.441 ± 0.033 0.137 ± 0.006 

Lorenz-96 K = 47 0.985 ± 0.047 0.115 ± 0.006 1.007 ± 0.043 0.325 ± 0.020 

col jnaj or 0.063 ± 0.003 0.063 ± 0.003 0.050 ±0.002 0.049 ± 0.002 

403.gcc 1.488 ±0.016 1.471 ± 0.014 1.530 ±0.021 1.239 ±0.020 


Prediction horizon 


There are fundamental limits on the prediction of 
chaotic systems. Positive Lyapunov exponents make 
long-term forecasts a difficult prospect beyond a cer- 
tain point for even the most-sophisticated methodd^lHS]. 
Note that the coordinates of points in higher-dimensional 
embedding spaces sample wider temporal spans of the 
time series. This increased ‘memory’ means that they 
capture more information about the dynamics than 
lower-dimensional embeddings do, which raises an im¬ 
portant concern about ro-LMA: whether its accuracy will 
degrade more rapidly with increasing prediction horizon 
than that of fnn-LMA. 

Recall that the formulations of both methods, as de¬ 
scribed and deployed in the previous sections, assume 
that measurements of the target system are available in 
real time: they “rebuild” the LMA models after each 
step, adding new time-series points to the embeddings 
as they arrive. Both ro-LMA and fnn-LMA can easily 
be modified to produce longer forecasts, however—say, 
h steps at a time, only updating the model with new 
observations at h- step intervals. Naturally, one would 
expect forecast accuracy to suffer as h increases for any 
non-constant signal. The question at issue in this section 
is whether the greater temporal span of the data points 
used by fnn-LMA mitigates that degradation, and to what 
extent. 

Assessing the accuracy of h- step forecasts requires 
a minor modification to the MASE calculation, since 
its denominator is normalized by one-step random-walk 
forecast errors. In an h- step situation, one should instead 
normalize by ft.-step forecasts, which involves modifying 
the scaling term to be the average root-mean-squared er¬ 
ror accumulated using random-walk forecasting on the 
initial training signal, h steps at a time-22. This gives a 
new figure of merit that we will call h-MASE: 


fc+n+1 

h-MASE = 

f=n+l 


_ \Pj~Cj |_ 

k sr^n / SlLi {xj-x i+L ) 2 
n—h A-^i—1 V h 


Note that the h- step forecast accuracy of the random- 
walk method will also degrade with h, so h-MASE will 


always be lower than MASE. This also means that h- 
MASE scores should not be compared for different h. 

In Table EH we provide h-MASE scores for h- step 
forecasts of the different sets of experiments from Sec¬ 
tion |III| The important comparisons here are, as men¬ 
tioned above, across the rows of the table. The different 
methods “reach” different distances back into the time se¬ 
ries to build the models that produce those forecasts, of 
course, depending on their delay and dimension. At first 
glance, this might appear to make it hard to compare, 
say, default-r ro-LMA and best-case-r fnn-LMA sensibly, 
since they use different rs and different values of the em¬ 
bedding dimension and thus are spanning a longer range 
of the time series. However, h is measured in units of 
the sample interval of the time series, so comparing one 
h- step forecast to another (for the same h) does make 
sense. 

There are a number of interesting questions to ask 
about the patterns in this table, beginning with the 
one that set off these experiments: how do fnn-LMA 
and ro-LMA compare if one individually optimizes r for 
each method? The numbers indicate that ro-LMA beats 
fnn-LMA for h = 1 on the I\ = 22 traces, but then 
loses progressively badly (i.e., by more cts) as h grows, 
coljnajor follows the same pattern except that ro-LMA 
is worse even at h = 1. For 403.gcc, fnn-LMA performs 
better at both ts and all values of h , but the dispar¬ 
ity between the accuracy of the two methods does not 
systematically worsen with increasing h. For K = 47, 
ro-LMA consistently beats fnn-LMA for both rs for h < 10 
but the accuracy of the two methods is comparable for 
longer prediction horizons. 

Another interesting question is whether the assertions 
in the previous section stand up to increasing prediction 
horizon. Those assertions were based on the results that 
appear in the h = 1 rows of Table |ITT| ro-LMA was better 
than fnn-LMA on the K = 22 Lorenz-96 experiments, 
for instance, for both r values. This pattern does not 
persist for longer prediction horizons: rather, fnn-LMA 
generally outperforms ro-LMA on the K = 22 traces for 
h = 10,50, and 100. The h = 1 comparisons for K = 47 
and col_major do generally persist for higher h, however. 
As mentioned before, 403.gcc is problematic because its 
MASE scores are so high, but the accuracies of the two 
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TABLE III: 
explained in the 


The ft-step mean absolute scaled error ( h-MASE ) scores for different forecast horizons (ft). As 
text, h-MASE scores should not be compared for different h (i.e., down the columns of this table). 


Signal 


h 

ro 

-LMA 

ro 

-LMA 

fnn-LMA 

fnn-LMA 




(default t) 

(best- 

-case r) 

(default r) 

(best- 

-case t) 

Lorenz-96 K = 

22 

i 

0.391 

± 

0.016 

0.073 

± 

0.002 

0.441 

± 

0.003 

0.137 

± 

0.006 

Lorenz-96 K = 

22 

10 

0.101 

± 

0.008 

0.066 

± 

0.003 

0.062 

± 

0.011 

0.033 

± 

0.002 

Lorenz-96 K = 

22 

50 

0.084 

± 

0.007 

0.074 

± 

0.008 

0.005 

± 

0.002 

0.004 

± 

0.001 

Lorenz-96 K = 

22 

100 

0.057 

± 

0.005 

0.050 

± 

0.004 

0.003 

± 

0.001 

0.003 

± 

0.001 

Lorenz-96 K = 

47 

1 

0.985 

± 

0.047 

0.115 

± 

0.006 

0.995 

± 

0.053 

0.325 

± 

0.020 

Lorenz-96 K = 

47 

10 

0.223 

± 

0.011 

0.116 

± 

0.005 

0.488 

± 

0.042 

0.218 

± 

0.012 

Lorenz-96 K = 

47 

50 

0.117 

± 

0.011 

0.112 

± 

0.010 

0.127 

± 

0.011 

0.119 

± 

0.010 

Lorenz-96 K = 

47 

100 

0.075 

± 

0.006 

0.068 

± 

0.005 

0.079 

± 

0.005 

0.075 

± 

0.004 

coljnajor 


1 

0.063 

± 

0.003 

0.063 

± 

0.003 

0.050 

± 

0.002 

0.049 

± 

0.002 

colnnajor 


10 

0.054 

± 

0.006 

0.046 

± 

0.003 

0.021 

± 

0.001 

0.018 

± 

0.001 

coljnajor 


50 

0.059 

± 

0.009 

0.037 

± 

0.003 

0.012 

± 

0.003 

0.009 

± 

0.001 

coljnajor 


100 

0.044 

± 

0.004 

0.028 

± 

0.006 

0.010 

± 

0.003 

0.007 

± 

0.001 

403.gcc 


1 

1.488 

± 

0.016 

1.471 

± 

0.014 

1.530 

± 

0.021 

1.239 

± 

0.020 

403.gcc 


10 

0.403 

± 

0.009 

0.396 

± 

0.009 

0.384 

± 

0.007 

0.369 

± 

0.010 

403.gcc 


50 

0.154 

± 

0.003 

0.151 

± 

0.005 

0.143 

± 

0.003 

0.141 

± 

0.003 

403.gcc 


100 

0.101 

± 

0.002 

0.101 

± 

0.003 

0.095 

± 

0.002 

0.093 

± 

0.002 


methods are similar for all ft, > 1. 

The fact that fnn-LMA generally outperforms ro-LMA 
for longer prediction horizons makes sense, simply be¬ 
cause ro-LMA samples less of the time series and there¬ 
fore has less ‘memory’ about the dynamics. Still, 
best-case ro-LMA performs almost as well as best-case 
fnn-LMA in many cases, even for ft = 100. In view 
of the fundamental limits on prediction of chaotic dy¬ 
namics, however, it is worth considering whether ei¬ 
ther method is really making correct long-term forecasts. 
Indeed, time-domain plots of long-term forecasts (e.g., 
Figure [ 7 ]) reveal that both fnn-LMA and ro-LMA fore¬ 
casts have fallen off the true trajectory and onto shadow 
trajectories—a well-known phenomenon when forecast¬ 
ing chaotic dynamic^--®. 

In other words, it appears that even a 50-step forecast 
of these chaotic trajectories is a tall order: i.e., that we 
are running up against the fundamental bounds imposed 
by the Lyapunov exponents. In view of this, it is promis¬ 
ing that ro-LMA generally keeps up with fnn-LMA in many 
cases—even when both methods are struggling with the 
prediction horizon, and even though the ro-LMA model 
has much less memory about the past history of the tra¬ 
jectory. An important aspect of our future research on 
this topic will be determining bounds on reasonable pre¬ 
diction horizons—as well as developing methods, if pos¬ 
sible, to increase prediction horizon without sacrificing 
the accuracy or speed of ro-LMA. 


V. CONCLUSION 

We have proposed a novel nonlinear forecast strategy 
that works in a two-dimensional version of the delay- 
coordinate embedding space. Our preliminary results 
suggest that this approach captures the dynamics well 
enough to enable effective prediction of several very dif¬ 
ferent real-world and artificial dynamical systems, even 
though working with a 2D embedding violates one of the 
most critical basic tenets of the delay-coordinate embed¬ 
ding machinery. 

The point of this paper is not only to explore whether 
prediction in projection works, but also to establish it as 
a useful practical technique. From that standpoint, the 
primary advantage of ro-LMA is that the 2D embedding 
that it uses to model the dynamics has only a single free 
parameter: the delay, r. The standard delay-coordinate 
embedding process has a second free parameter (the di¬ 
mension) that requires expert human judgment to esti¬ 
mate, making that class of methods all but useless for 
adaptive modeling and forecasting. As described at the 
end of Section |IV[ we believe that good values for r can 
be effectively estimated on the fly from the time series. 
This will let our method adapt to nonstationary dynam¬ 
ics. In this fashion, the line of research described in this 
paper bridges the gap between rigorous nonlinear mathe¬ 
matical models- which are ineffective in real-time—and 
approximate methods that are agile enough for adaptive 
modeling of nonstationary dynamical processes. 

Data length is an important consideration in any 
nonlinear time-series application. Traditional estimates 
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FIG. 7: A best-case-r ro-LMA forecast of a K = 47 Lorenz-96 trace for h = 50. The forecast (red) follows the true 
trajectory (blue) for a while, then falls off onto a shadow trajectory, then gets recorrected when a new set of 
observations are incorporated into the model after h time steps. 


{e.g., by Smittl^ 7 and by Tsonis et a/.®) would suggest 
that « 10 1 ' data points would be required for a success¬ 
ful delay-coordi nate e mbedding for the Lorenz-96 K = 47 
data in Section III A where the known (Iky valuers) in¬ 
dicate that one might need at least m = 38 dimensions 
to properly unfold the dynamics. While it may be possi¬ 
ble to collect that much data in a synthetic experiment, 
that is certainly not an option in a real-world forecasting 
situation, particularly if the dynamics that one wants 
to forecast are nonstationary. Note, however, that we 
were able to get good results on that system with only 
45,000 points. The Smith/Tsonis estimates were derived 
for the specific purposes of correlation dimension calcu¬ 
lations via the Grassberger-Procaccia algorithm; in our 
opinion, they are overly pessimistic for forecasting. For 
example, Sauei 10 ^ successfully forecasted the continuation 
of a 16,000-point time series embedded in 16 dimensions; 
Sugihara & Majf 11 used delay-coordinate embedding with 
m as large as seven to successfully forecast biological and 
epidemiological time-series data as short as 266 points. 
Given these results, we believe that our traces are long 
enough to support the conclusions that we drew from 
them: viz., fnn-LMA is a reasonable point of compari¬ 
son, and the fact that ro-LMA outperforms it is meaning¬ 
ful. Nonetheless, an important future-work item will be 
a careful study of the effects of data length on ro-LMA, 
which will have profound implications for its ability to 
handle nonstationarity. 


As stated before, no forecast model is ideal for all noise- 
free deterministic signals, let alone all real-world time- 
series data sets. However, the proof of concept offered 
in this paper is encouraging: prediction in projection ap¬ 
pears to work remarkably well, even though the mod¬ 
els that it uses are not topologically faithful to the true 
dynamics. As mentioned in the Introduction, there are 
many other creative and effective ways to leverage the 
structure of an embedded dynamics in order to predict 


the future course of a trajectory (e.g. 121SHH1) it would 
be interesting to see how well these methods work in a 
reduced-order embedding space. Our ultimate goal is to 
be able to show that prediction in projection—a simple 
yet powerful reduction of a time-tested method- has real 
practical utility for a wide spectrum of forecasting tasks 
as a simple, agile, adaptive, noise-resilient, forecasting 
strategy for nonlinear systems. 
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